FOREWORD


This paper contains the mathematical validation of a Lagrangian
technique for nonlinear programming that replaces the original
problem by an auxiliary problem that is solvable by standard
methods. The properties of the auxiliary problem are described
and validated under certain conditions, and a number of
applications are described. Several examples are presented in order to
clarify various results.

Some parts of this paper have been extracted from the author’s
Ph.D. thesis in mathematics at the University of Michigan. The
writing of this thesis was supported by National Science
Foundation Grant GP-2215.

The author solicits criticisms, questions, and discussion of
any of the conclusions reached.

Theory of Lagrange Multipliers
for Constrained Optimization Problems


ABSTRACT 


This  paper  treats  an  extension  of  one  version  of  the  classical  Lagrange 
multiplier  rule  as  applied  to  nonlinear  programming  problems.  For  a 
given  problem,  an  auxiliary  problem  is  defined  and  its  properties  are 
studied  under  various  assumptions.  In  particular,  when  the  given  problem 
has  a  strictly  convex  objective  function  and  concave  constraints  it  is  shown 
that  the  auxiliary  problem  is  one  of  maximizing  a  concave  differentiable 
function  over  an  open  set  subject  only  to  nonnegativity  conditions.  Some 
applications of this theory are presented along with the connection
between the auxiliary problem and a “dual” of the given problem.

1.  INTRODUCTION


Lagrange multipliers, in one form or another, have played an important
role in the recent development of nonlinear programming theories. Indeed,
perhaps the most important theoretical result in this field to date is the
celebrated Kuhn-Tucker Theorem,¹ which is an extension of the classical Lagrange
multiplier rule in its most common form (see Courant and Hilbert,² p 165). In
the same paper, Kuhn and Tucker show the equivalence between convex
programs and their associated “saddle value” problems.

Related to these concepts are the variations of the dual program
formulated by Wolfe,³ Huard,⁴ and several others. This duality theory for nonlinear
programming received impetus from its counterpart in linear programming,
where it enjoys a very pleasing and useful symmetry. Early formulations of
the dual of a nonlinear program did not enjoy perfect symmetry (for example,
the dual of a convex program was not convex), and attempts to achieve it led
to a closer study of the properties of Lagrangian functions (see Rockafellar⁵
and Whinston⁶ and their references).

A study of the Lagrangian function of a problem has proved useful from
a computational standpoint. For example, Everett⁷ has presented an
interesting result that applies to general problems involving separable objective
functions and constraints. The method essentially involves an iteration scheme in
the space of Lagrange multipliers together with comparatively simple
minimization operations at each iteration. Although it is clear how these minimization
operations are to be performed, it is not clear how the optimal set of Lagrange
multipliers is to be chosen.

Most of the work in this field has emphasized the best-known formulation
of the Lagrange multiplier rule. There is another formulation (Ref 2, pp 231-32)
based on the Legendre transformation that states the equivalence of a given
equality constrained problem with a related but unconstrained optimization
problem. The main purpose of this paper is to generalize this version of the Lagrange
multiplier rule to handle inequality as well as equality constraints and to
describe the structure of the related problem in some detail. It will, in fact, be
shown that often a great deal of the structure of this related problem can be
exploited computationally.

Section 2 contains the definitions of the various constituents of the related
or auxiliary problem. These definitions can be made without reference to any
particular hypothesis on the elements of the given problem, and some results
may be obtained in this general setting.

In Sec 3 the discussion includes only convex programs with strictly
convex objective functions. No differentiability assumptions are necessary.
Although many of the results of this section hold for less restricted problems,
the assumption of strict convexity seems to be the most concise and common
hypothesis that can be made to ensure that the auxiliary problem is well behaved.
With these restrictions it will be shown that the auxiliary problem becomes
one of maximizing a concave differentiable function over an open set subject
only to nonnegativity conditions. This would appear to be a simple and useful
procedure computationally, since any standard gradient-ascent technique could
theoretically be employed on the auxiliary problem to obtain a solution of the
given problem. Unfortunately the calculation of the gradient of the objective
function of the auxiliary problem involves the solution of a nonlinear program,
and, unless the given problem has a special structure, this solution may
require an excessive amount of effort. On the other hand, many problems do
have this special structure (e.g., separable programs) and for these problems
the solution of the aforementioned nonlinear problem is easy. Lasdon,⁸
Takahashi,⁹ and Falk¹⁰ have investigated such decomposable problems.

Takahashi views the auxiliary problem as the conjugate of a second related
problem and uses known results of conjugate functions to verify his results.
Although the theorems are stated correctly, the proofs are incomplete since
questions concerning the convexity of the domains of the functions involved
are ignored.

In Sec 4 the auxiliary problem is related to the dual of the given problem
as defined by Wolfe,³ and it is shown that the two problems are essentially
equivalent. This is important since the auxiliary problem is a convex program
whereas the Wolfe dual generally is not.

Also  in  Sec  4  the  theory  is  applied  to  decomposable  and  separable  programs 
and  to  the  problem  of  minimizing  a  quotient  of  two  functions. 


2.  THE  GENERAL  CASE 

The mathematical program to be discussed has the form

    minimize  $\{\phi(x) : f(x) \ge 0,\ x \in C\}$    (1)

where $C$ is a subset of $E^n$ and where $\phi : E^n \to E^1$ and $f : E^n \to E^m$. In general,
for a given problem there are many ways to partition the constraining
inequalities (or equalities), and hence the selection of a particular $f$ and $C$ in Eq 1 is
somewhat arbitrary. Computational considerations discussed in Sec 4 indicate
which constraints should be represented by $f$ and which should be represented
by $C$. It is assumed that $m \ge 1$.

Equality constraints have not been included explicitly in order to simplify
later notation. Their inclusion would cause no theoretical problems as all the
results that follow hold in their presence.¹¹

The definitions that follow can be made without any additional hypothesis
on $\phi$, $f$, or $C$.

The Lagrangian function of Eq 1 is defined on $E^n \times E^m$ by the relation

    $\lambda(x, u) = \phi(x) - \langle u, f(x) \rangle$.    (2)


A function $\gamma$ is defined over its domain $D[\gamma]$ by means of the relations

    $D[\gamma] = \{u \ge 0 : \lambda(\cdot, u) \text{ attains its minimum over } C\}$    (3)

    $\gamma(u) = \min \{\lambda(x, u) : x \in C\}$.    (4)

Thus $u$ is in the domain of $\gamma$ if and only if the function $\lambda(\cdot, u)$ attains a finite
(absolute) minimum at some finite point $x$. The totality of points in $C$ that
minimize $\lambda(\cdot, u)$ for a given $u \in D[\gamma]$ is denoted by $X(u)$ and will be termed
the “minimizing function.” In general, $X(u)$ is a set function defined over $D[\gamma]$
into $C$. The function $\gamma$ will be termed the “auxiliary function” of Eq 1, and the
problem

    $\max \{\gamma(u) : u \in D[\gamma]\}$    (5)

will be termed the “auxiliary problem” of Eq 1. These definitions may be
illustrated by an example.

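Example 1 (See Fig. 1)

Fig. 1. Example 1: (a) Feasible Region; (b) The Auxiliary Function

Minimize

    $\phi(x) = \tfrac{1}{2}x_1^2 + 2x_1 + \tfrac{1}{2}x_2^2 - x_2$

subject to

    $f(x) = x_1 - x_2 - 2 \ge 0$,
    $x_1, x_2 \ge 0$.

With $C = \{x : x_1, x_2 \ge 0\}$ the Lagrangian is separable in $x_1$ and $x_2$, so that

    $\gamma(u) = \min_{x_1 \ge 0} \{\tfrac{1}{2}x_1^2 + (2 - u)x_1\} + \min_{x_2 \ge 0} \{\tfrac{1}{2}x_2^2 + (u - 1)x_2\} + 2u$.

Since each of these one-dimensional minimizations can be carried out for all
values of $u \ge 0$ it follows that

    $D[\gamma] = (E^1)^+$.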


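Calculating the derivatives of the functions involved in this expression for $\gamma$
and setting them equal to zero gives

    $X_1(u) = \begin{cases} 0, & 0 \le u \le 2 \\ u - 2, & u > 2 \end{cases}$

    $X_2(u) = \begin{cases} 0, & u > 1 \\ 1 - u, & 0 \le u \le 1. \end{cases}$

Substituting these expressions into $\lambda(x, u)$ gives

    $\gamma(u) = \begin{cases} -\tfrac{1}{2}u^2 + 3u - \tfrac{1}{2}, & 0 \le u \le 1 \\ 2u, & 1 < u \le 2 \\ -\tfrac{1}{2}u^2 + 4u - 2, & u > 2. \end{cases}$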
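The closed forms of Example 1 can be checked numerically. The following is only a sketch: it assumes the data $\phi(x) = \tfrac{1}{2}x_1^2 + 2x_1 + \tfrac{1}{2}x_2^2 - x_2$ and $f(x) = x_1 - x_2 - 2$, which are the data consistent with the expressions for $X(u)$ and $\gamma(u)$ above, and all function names are hypothetical.

```python
def lagrangian(x1, x2, u):
    """lambda(x, u) = phi(x) - u * f(x) for the assumed data of Example 1."""
    phi = 0.5 * x1 ** 2 + 2.0 * x1 + 0.5 * x2 ** 2 - x2
    f = x1 - x2 - 2.0
    return phi - u * f


def minimizing_point(u):
    """X(u): minimizer of lambda(., u) over C = {x : x1, x2 >= 0}."""
    return max(0.0, u - 2.0), max(0.0, 1.0 - u)


def gamma_closed_form(u):
    """Piecewise expression for the auxiliary function gamma(u), u >= 0."""
    if u <= 1.0:
        return -0.5 * u ** 2 + 3.0 * u - 0.5
    if u <= 2.0:
        return 2.0 * u
    return -0.5 * u ** 2 + 4.0 * u - 2.0


def gamma_brute_force(u, step=0.01, upper=6.0):
    """Crude grid minimization of lambda(., u) over [0, upper]^2."""
    grid = [i * step for i in range(int(round(upper / step)) + 1)]
    # The Lagrangian is separable, so each coordinate is minimized on its own.
    part1 = min(0.5 * t ** 2 + (2.0 - u) * t for t in grid)
    part2 = min(0.5 * t ** 2 + (u - 1.0) * t for t in grid)
    return part1 + part2 + 2.0 * u
```

Evaluating `lagrangian` at `minimizing_point(u)` should reproduce `gamma_closed_form(u)` on each of the three intervals, and the grid search gives an independent (coarser) confirmation.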

If the equality constraints $g_i(x) = 0$ $(i = 1, \ldots, p)$ were added to problem
1 the auxiliary variables $u_i$ $(i = 1, \ldots, p)$ associated with these constraints
would not be restricted to be nonnegative in the definition of $D[\gamma]$.

It may be proved that if Eq 1 is the linear program

    $\min \{\langle c, x \rangle : Ax \ge b,\ x \ge 0\}$

with $f(x) = Ax - b$ and $C = \{x : x \ge 0\}$, then Eq 5 is precisely its dual (see Falk¹¹).
However, if some of the inequalities described by $f$ are used to describe $C$, then
problem 5 becomes a “piecewise linear” program.


Example 2 (See Fig. 2)

Minimize

    $x_1 + 5x_2$

subject to

    $f_1(x) = -2 + 2x_1 + x_2 \ge 0$
    $f_2(x) = -3 + x_1 + 3x_2 \ge 0$

    $C$ :  $-\tfrac{3}{2}x_1 + x_2 \le 3$
           $x_1 - x_2 \le 2$
           $x_1, x_2 \ge 0$.

The Lagrangian function for this problem is

    $\lambda(x, u) = x_1 + 5x_2 - u_1(-2 + 2x_1 + x_2) - u_2(-3 + x_1 + 3x_2)$
                  $= (1 - 2u_1 - u_2)x_1 + (5 - u_1 - 3u_2)x_2 + 2u_1 + 3u_2$

and the corresponding auxiliary function becomes

    $\gamma(u) = \min_{x \in C} \{(1 - 2u_1 - u_2)x_1 + (5 - u_1 - 3u_2)x_2\} + 2u_1 + 3u_2$.

The simplex tableau can be used to investigate the three vertices of $C$ and to
determine the corresponding sets of $u_1$ and $u_2$ for which these vertices are
attained. The results are given in Fig. 2 along with the corresponding value
of $\gamma(u)$.
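The vertex investigation described for Example 2 can be imitated without a simplex tableau. In the sketch below (hypothetical names throughout) the three vertices of $C$ and the two extreme directions of its recession cone were worked out by hand from the inequalities defining $C$; the linear form is bounded below on $C$ exactly when it is nonnegative along both directions, and the minimum is then attained at a vertex.

```python
# Vertices of C = {x >= 0 : -1.5*x1 + x2 <= 3, x1 - x2 <= 2}.
VERTICES = [(0.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
# Extreme directions of the recession cone {d >= 0 : d1 <= d2 <= 1.5*d1}.
RAYS = [(1.0, 1.0), (2.0, 3.0)]


def gamma(u1, u2, tol=1e-12):
    """Auxiliary function of Example 2, or None when u is outside D[gamma]."""
    if u1 < 0.0 or u2 < 0.0:
        return None
    c1 = 1.0 - 2.0 * u1 - u2   # coefficient of x1 in lambda(x, u)
    c2 = 5.0 - u1 - 3.0 * u2   # coefficient of x2 in lambda(x, u)
    # If the linear form decreases along a recession direction, no minimum exists.
    if any(c1 * r1 + c2 * r2 < -tol for r1, r2 in RAYS):
        return None
    return min(c1 * v1 + c2 * v2 for v1, v2 in VERTICES) + 2.0 * u1 + 3.0 * u2
```

For instance, `gamma(0.5, 0.0)` evaluates the auxiliary function at a point of its domain, while `gamma(1.0, 1.0)` returns `None` because the Lagrangian is unbounded below on $C$ for that multiplier.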


Fig. 2. A Linear Program with its Auxiliary Problem (panel c: The Domain of $\gamma$)

If Eq 1 is the quadratic problem

    minimize  $\langle x, Dx \rangle + \langle d, x \rangle$

    subject to  $Ax \ge b$

where $f(x) = Ax - b$ and $C = E^n$ with $D$ symmetric and positive definite then
problem 5 becomes

    maximize  $-\tfrac{1}{4}\langle AD^{-1}A'u, u \rangle + \langle \tfrac{1}{2}AD^{-1}d + b, u \rangle - \tfrac{1}{4}\langle D^{-1}d, d \rangle$

    subject to  $u \ge 0$.

This is the dual problem that Lemke¹² addresses in his method for quadratic
programming.
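The quadratic case can be verified by direct substitution. The sketch below assumes the objective $\langle x, Dx \rangle + \langle d, x \rangle$ (with no factor $\tfrac{1}{2}$; the coefficients of the dual change if one is used) and, purely to avoid a matrix inverse, a diagonal $D$ stored as a list of its diagonal entries. The function names are hypothetical.

```python
def transpose_times(A, u):
    """Return A'u for a matrix A given as a list of rows."""
    return [sum(A[i][j] * u[i] for i in range(len(A))) for j in range(len(A[0]))]


def dual_objective(D_diag, d, A, b, u):
    """-1/4 <A D^-1 A'u, u> + <1/2 A D^-1 d + b, u> - 1/4 <D^-1 d, d>."""
    w = transpose_times(A, u)
    quad = sum(wj * wj / Dj for wj, Dj in zip(w, D_diag))
    cross = sum(wj * dj / Dj for wj, dj, Dj in zip(w, d, D_diag))
    const = sum(dj * dj / Dj for dj, Dj in zip(d, D_diag))
    linear = sum(bi * ui for bi, ui in zip(b, u))
    return -0.25 * quad + 0.5 * cross + linear - 0.25 * const


def lagrangian_minimum(D_diag, d, A, b, u):
    """Minimize <x, Dx> + <d, x> - <u, Ax - b> over x in E^n directly."""
    w = transpose_times(A, u)
    # Stationarity 2Dx + d - A'u = 0 gives the unique minimizer.
    x = [0.5 * (wj - dj) / Dj for wj, dj, Dj in zip(w, d, D_diag)]
    phi = sum(Dj * xj * xj + dj * xj for Dj, dj, xj in zip(D_diag, d, x))
    slack = [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return phi - sum(ui * si for ui, si in zip(u, slack))
```

The two functions should agree for every $u$, which is the statement that the displayed maximization is the auxiliary problem of the quadratic program.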

Related to the function $\gamma$ is the function $\bar{\gamma}$ defined by replacing “minimum”
with “infimum” in definitions 3 and 4 and requiring that $u \in D[\bar{\gamma}]$ whenever
$\lambda(\cdot, u)$ has a finite infimum over $C$. Clearly $D[\gamma] \subset D[\bar{\gamma}]$ and $\gamma(u) = \bar{\gamma}(u)$ for
each $u \in D[\gamma]$. In many problems $D[\gamma] = D[\bar{\gamma}]$ (e.g., $D[\gamma] = D[\bar{\gamma}] = (E^m)^+$ if $C$
is compact and if $\phi$ and $f$ are continuous), although it is easy to find examples
where the strict inclusion holds. One such example is constructed by setting

    $\phi(x) = e^{x_2}$

    $f(x) = x_1$

    $C = E^2$.

Here

    $\gamma(u) = \min_{x_1, x_2} \{e^{x_2} - u x_1\}$

and this does not attain a minimum for any value of $u$ so that $D[\gamma] = \emptyset$.
However, for $u = 0$ the term $e^{x_2}$ has an infimum of 0 so that $D[\bar{\gamma}] = \{0\}$.

This paper is primarily concerned with the function $\gamma$, although
occasionally $\bar{\gamma}$ will be referred to in order to clarify certain results.

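Theorem 1

The function $\gamma$ is concave over convex subsets of $D[\gamma]$.

Proof: Fix $u^1, u^2 \in D[\gamma]$, $\alpha \ge 0$, $\beta \ge 0$, $\alpha + \beta = 1$, and assume that
$u^3 = \alpha u^1 + \beta u^2 \in D[\gamma]$. Then

    $\gamma(u^3) = \min \{\alpha\lambda(x, u^1) + \beta\lambda(x, u^2) : x \in C\}$
              $\ge \alpha \min \{\lambda(x, u^1) : x \in C\} + \beta \min \{\lambda(x, u^2) : x \in C\}$
              $= \alpha\gamma(u^1) + \beta\gamma(u^2)$

and the proof is complete.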

It is not true in general that $D[\gamma]$ is convex, even when Eq 1 is a convex
program.

Example 3

Minimize

    $\{\max\{|x_1|, e^{x_2}\} : x_1 = 0\}$.

Each of the functions $\phi_1 : (x_1, x_2) \to |x_1|$ and $\phi_2 : (x_1, x_2) \to e^{x_2}$ is convex.
Since the maximum of two convex functions is convex, it follows that the function

    $\phi : (x_1, x_2) \to \max\{|x_1|, e^{x_2}\}$

is convex.

Pick

    $f(x) = x_1$

    $C = E^2$.

Since the constraint is an equality, the value of $u$ is unrestricted.

By definition

    $\gamma(u) = \min \{\max\{|x_1|, e^{x_2}\} - u x_1\}$

whenever this minimum exists. However, this minimum exists only when
$u = 1$ and $u = -1$ so that

    $D[\gamma] = \{-1, 1\}$

and is not convex. It is interesting to note that

    $X(1) = \{(x_1, x_2) : x_1 \ge e^{x_2}\}$

and

    $X(-1) = \{(x_1, x_2) : x_1 \le -e^{x_2}\}$

are both unbounded sets. For convex problems such an example could not be
constructed otherwise (see corollary to Theorem 8).

Note that $D[\gamma] = (E^m)^+$ when $\phi$ and $f$ are continuous and when $C$ is
compact. The convexity of $D[\gamma]$ will be established for a different class of problems
in Sec 3.

Since $\gamma$ is concave over convex subsets of $D[\gamma]$, it is differentiable almost
everywhere in int $D[\gamma]$. In order to calculate $\nabla\gamma = (\partial\gamma/\partial u_1, \ldots, \partial\gamma/\partial u_m)^T$
when it exists, it is necessary to establish a preliminary result.

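Lemma 1

Let $u^* \in$ int $D[\gamma]$ and assume that the differential $d\gamma(u^*; \cdot)$ exists. Let
$g \in E^m$ be such that

    $\gamma(u^*) - \gamma(u) \ge \langle g, u^* - u \rangle$

for all $u \in D[\gamma]$. Then $\nabla\gamma(u^*) = g$.

Proof: Suppose that $\nabla\gamma(u^*) \ne g$. Since $\gamma$ is differentiable at $u^*$, for any $\epsilon$
such that $0 < \epsilon < \|g - \nabla\gamma(u^*)\|$ there is a neighborhood of $u^*$ of radius $\delta$
(denoted by $N(u^*; \delta)$), contained in $D[\gamma]$, such that

    $|\gamma(u) - \gamma(u^*) - \langle \nabla\gamma(u^*), u - u^* \rangle| \le \epsilon\|u - u^*\|$

for all $u \in N(u^*; \delta)$. Let $v^*$ be a point on the ray emanating from $u^*$
with direction $\nabla\gamma(u^*) - g$, i.e., for some $a > 0$,

    $v^* = u^* + a(\nabla\gamma(u^*) - g) \in N(u^*; \delta)$.

With this selection of $v^*$ it follows that

    $\langle \nabla\gamma(u^*) - g, v^* - u^* \rangle = a\|\nabla\gamma(u^*) - g\|^2$
                                  $= \|v^* - u^*\| \, \|\nabla\gamma(u^*) - g\|$
                                  $> \epsilon\|v^* - u^*\|$

so that

    $\langle g, u^* - v^* \rangle > \langle \nabla\gamma(u^*), u^* - v^* \rangle + \epsilon\|v^* - u^*\|$.

The hypothesis of the lemma, together with this inequality, implies

    $\gamma(u^*) - \gamma(v^*) \ge \langle g, u^* - v^* \rangle$
                      $> \langle \nabla\gamma(u^*), u^* - v^* \rangle + \epsilon\|v^* - u^*\|$

which violates the differentiability estimate above, and the proof is complete.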

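Theorem 2

Let $u^* \in$ int $D[\gamma]$ and assume $d\gamma(u^*; \cdot)$ exists. Then

    $\nabla\gamma(u^*) = -f(x^*)$

where $x^*$ is any point in $X(u^*)$.

Proof:

    $\gamma(u) = \min \{\phi(x) - \langle u, f(x) \rangle : x \in C\}$
            $\le \phi(x^*) - \langle u, f(x^*) \rangle$
            $= \phi(x^*) - \langle u^*, f(x^*) \rangle - \langle u, f(x^*) \rangle + \langle u^*, f(x^*) \rangle$
            $= \gamma(u^*) + \langle u - u^*, -f(x^*) \rangle$.

Hence $-f(x^*)$ satisfies the condition of Lemma 1, and the
proof is complete.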
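Theorem 2 can be illustrated with a finite-difference check. The sketch below reuses the data assumed for Example 1, namely $\phi(x) = \tfrac{1}{2}x_1^2 + 2x_1 + \tfrac{1}{2}x_2^2 - x_2$, $f(x) = x_1 - x_2 - 2$, and $C = \{x : x \ge 0\}$; the function names and the step size are hypothetical choices.

```python
def f(x1, x2):
    """Constraint function assumed for Example 1."""
    return x1 - x2 - 2.0


def minimizing_point(u):
    """X(u) for the assumed Example 1 data."""
    return max(0.0, u - 2.0), max(0.0, 1.0 - u)


def gamma(u):
    """gamma(u) = lambda(X(u), u)."""
    x1, x2 = minimizing_point(u)
    phi = 0.5 * x1 ** 2 + 2.0 * x1 + 0.5 * x2 ** 2 - x2
    return phi - u * f(x1, x2)


def central_difference(u, h=1e-6):
    """Numerical derivative of gamma at an interior point u."""
    return (gamma(u + h) - gamma(u - h)) / (2.0 * h)
```

At interior points such as $u = 0.5$, $1.5$, and $3$ the numerical derivative should match $-f(X(u))$, so the gradient of the auxiliary function is available as a by-product of the inner minimization.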

If $\gamma$ attains its maximum at a point $u^*$ where $\nabla\gamma(u^*)$ exists, the next
result allows computation of the solution of problem 1.

Theorem 3

Proof: Since $\gamma$ is maximized at $u^*$ and is differentiable there it is
necessary that

    $\nabla\gamma(u^*) = -f(x^*) = 0$

where $x^*$ is any point in $X(u^*)$. But

    $\gamma(u^*) = \phi(x^*) - \langle u^*, f(x^*) \rangle$
              $\le \phi(x) - \langle u^*, f(x) \rangle$

for all $x \in C$. For any feasible point $x$, $f(x) \ge 0$ so that $\langle u^*, f(x) \rangle \ge 0$. It
follows that

    $\phi(x^*) \le \phi(x)$

for all feasible $x$, and the proof is complete.

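Theorem 4

If $x$ and $u$ are feasible points of problem 1 and its auxiliary problem
respectively, then

    $\gamma(u) \le \phi(x)$.

Proof: Assume $x$ and $u$ are feasible. Then, since $u \ge 0$ and $f(x) \ge 0$,

    $\gamma(u) = \min \{\phi(z) - \langle u, f(z) \rangle : z \in C\}$
            $\le \phi(x) - \langle u, f(x) \rangle$
            $\le \phi(x)$

and the proof is complete.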
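The bound of Theorem 4 can be observed on a grid. This sketch again uses the data assumed for Example 1 ($\phi(x) = \tfrac{1}{2}x_1^2 + 2x_1 + \tfrac{1}{2}x_2^2 - x_2$, $f(x) = x_1 - x_2 - 2$, $C = \{x : x \ge 0\}$); the names and the grids are hypothetical.

```python
def phi(x1, x2):
    """Objective assumed for Example 1."""
    return 0.5 * x1 ** 2 + 2.0 * x1 + 0.5 * x2 ** 2 - x2


def f(x1, x2):
    """Constraint function assumed for Example 1."""
    return x1 - x2 - 2.0


def gamma(u):
    """Auxiliary function via the closed-form minimizer X(u)."""
    x1, x2 = max(0.0, u - 2.0), max(0.0, 1.0 - u)
    return phi(x1, x2) - u * f(x1, x2)


def smallest_gap(multipliers, points):
    """Smallest value of phi(x) - gamma(u) over feasible x and u >= 0."""
    return min(
        phi(*x) - gamma(u)
        for u in multipliers if u >= 0.0
        for x in points if min(x) >= 0.0 and f(*x) >= 0.0
    )
```

On any such grid the gap is nonnegative; it vanishes at the pair $u = 4$, $x = (2, 0)$, where the two problems share the value 6.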

The following examples indicate the need to assume more structure on
$\phi$, $f$, and $C$ in order to establish a close relation between problem 1 and its
auxiliary problem. The objective functions of these problems are not convex.

Example 4 (See Fig. 3)

Minimize

    $\{-x^2 : 1 - 2x = 0,\ 0 \le x \le 1\}$.

Choosing $f(x) = 1 - 2x$ and $C = \{x : 0 \le x \le 1\}$ it is found that $D[\gamma] = E^1$ and $\gamma$
attains its maximum at $u^* = \tfrac{1}{2}$. However $x^* = \tfrac{1}{2} \notin X(u^*) = \{0, 1\}$ and
$\gamma(u^*) < \phi(x^*)$. Hence both problems are feasible and have optimal solutions, but
these solutions are not directly related.
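The gap in Example 4 is easy to reproduce. The sketch below takes $\phi(x) = -x^2$, $f(x) = 1 - 2x$, and $C = [0, 1]$, reads the constraint as the equality $1 - 2x = 0$ (so that $u$ is unrestricted), and replaces $C$ by a fine grid; these readings and the function names are assumptions made for illustration.

```python
def phi(x):
    """Nonconvex objective of Example 4."""
    return -x * x


def f(x):
    """Constraint function of Example 4."""
    return 1.0 - 2.0 * x


def gamma(u, points=1001):
    """Minimum over a grid of C = [0, 1] of phi(x) - u * f(x)."""
    grid = [i / (points - 1) for i in range(points)]
    return min(phi(x) - u * f(x) for x in grid)


def best_multiplier(candidates):
    """Candidate multiplier with the largest auxiliary value."""
    return max(candidates, key=gamma)
```

Because $\lambda(\cdot, u)$ is concave in $x$, its minimum over $[0, 1]$ sits at an endpoint, giving $\gamma(u) = \min\{-u,\ u - 1\}$. The maximum $-\tfrac{1}{2}$ at $u^* = \tfrac{1}{2}$ lies strictly below $\phi(\tfrac{1}{2}) = -\tfrac{1}{4}$.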

Example 5 (See Fig. 3)

Minimize

    $\{-x^2 : 1 - 2x = 0,\ x = 0 \text{ or } 1\}$.

Choosing $f$ as above and $C = \{0, 1\}$ it is easily seen that the auxiliary problem
is feasible and has an optimal solution whereas the stated problem is not feasible.

Fig. 3. The Auxiliary Function

Example 6

Minimize

    $\{x^3 : x \ge 0\}$.

Setting $f(x) = x$ and $C = E^1$ it follows that $D[\gamma]$ is empty so that the auxiliary
problem is not feasible, whereas the stated problem is feasible and has an
optimal solution.

The following principle has applications to decomposable and separable
programming which will be pointed out in more detail in Sec 4. The theorem
is stated here because no special hypotheses are needed on $\phi$, $f$, and $C$.

Suppose that problem 1 has the form

    minimize  $\{\phi(x) + \psi(y) : f(x) + g(y) \ge 0,\ x \in C,\ y \in D\}$.    (6)

Let $\gamma$ denote the auxiliary function of this problem and $(X, Y)$ its minimizing
function. Let $\gamma'$ and $\gamma''$ denote the auxiliary functions of the two problems

    minimize  $\{\phi(x) : f(x) \ge 0,\ x \in C\}$    (7)

and

    minimize  $\{\psi(y) : g(y) \ge 0,\ y \in D\}$.    (8)

Let $X'$ and $Y''$ denote the minimizing functions of Eqs 7 and 8 respectively.

Theorem 5

Problems 6, 7, and 8 are related in the following manner:

    (a)  $\gamma = \gamma' + \gamma''$

    (b)  $D[\gamma] = D[\gamma'] \cap D[\gamma'']$

    (c)  $(X, Y) = (X', Y'')$

Proof: The theorem is a direct consequence of the relation

    $\gamma(u) = \min \{\phi(x) + \psi(y) - \langle u, f(x) + g(y) \rangle : x \in C,\ y \in D\}$
            $= \min \{\phi(x) - \langle u, f(x) \rangle : x \in C\}$
              $+ \min \{\psi(y) - \langle u, g(y) \rangle : y \in D\}$.
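The decomposition in Theorem 5 can be seen on a toy instance. In the sketch below the finite sets `C` and `D`, the functions `phi`, `psi`, `f`, `g`, and all other names are hypothetical choices made only so that every minimum is attained and can be found by enumeration.

```python
from itertools import product

C = [0.0, 1.0, 2.0]    # hypothetical finite set for x
D = [-1.0, 0.5, 3.0]   # hypothetical finite set for y


def phi(x):
    return (x - 1.0) ** 2


def psi(y):
    return abs(y)


def f(x):
    return x - 1.0


def g(y):
    return 2.0 - y


def gamma_joint(u):
    """Auxiliary function of the coupled problem (6), by enumeration."""
    return min(phi(x) + psi(y) - u * (f(x) + g(y)) for x, y in product(C, D))


def gamma_x(u):
    """Auxiliary function of subproblem (7)."""
    return min(phi(x) - u * f(x) for x in C)


def gamma_y(u):
    """Auxiliary function of subproblem (8)."""
    return min(psi(y) - u * g(y) for y in D)
```

For every multiplier the joint minimum equals the sum of the two smaller minima, so the coupled problem never needs to be searched over the product set.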

The next theorem states that any feasible point $u$ of the auxiliary problem
yields, via $X(u)$, a solution of another problem related to problem 1.

Theorem 6 (Everett)⁷

If $u^* \in D[\gamma]$ then a point $x^*$ in $X(u^*)$ is a solution of the problem

    minimize  $\{\phi(x) : f(x) \ge f(x^*),\ x \in C\}$.

Proof: By definition, $x^* \in C$ and

    $\gamma(u^*) = \phi(x^*) - \langle u^*, f(x^*) \rangle \le \phi(x) - \langle u^*, f(x) \rangle$

for all $x \in C$. Since $u^* \ge 0$ it follows that

    $\phi(x^*) \le \phi(x) - \langle u^*, f(x) - f(x^*) \rangle \le \phi(x)$

for all $x \in C$ and $f(x) \ge f(x^*)$.
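Everett's result is straightforward to test by enumeration. The instance below (a finite set `C`, an objective `phi`, and a constraint function `f`) is hypothetical and chosen only for illustration; the theorem itself requires no structure on these data.

```python
C = range(0, 11)  # hypothetical finite set C = {0, 1, ..., 10}


def phi(x):
    return (x - 7) ** 2


def f(x):
    return 4 - x


def minimizing_point(u):
    """One point of X(u): a minimizer of phi(x) - u * f(x) over C."""
    return min(C, key=lambda x: phi(x) - u * f(x))


def solves_perturbed_problem(u):
    """Check that x* minimizes phi over {x in C : f(x) >= f(x*)}."""
    x_star = minimizing_point(u)
    return all(phi(x_star) <= phi(x) for x in C if f(x) >= f(x_star))
```

Each nonnegative multiplier therefore yields the solution of some problem with a perturbed right-hand side $f(x^*)$, even though that right-hand side cannot be prescribed in advance.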

3.  THE  STRICTLY  CONVEX  CASE

Unless otherwise stated, throughout this section it is assumed that the
Lagrangian function $\lambda(\cdot, u)$ defined in Eq 2 is strictly convex for each
$u \in D[\gamma]$. Such would be the case, for example, if $\phi$ is strictly convex and
each $f_i$ is concave. No differentiability assumptions are required. The set $C$
is assumed to be closed and convex but not necessarily compact.

Theorem 7

If $\lambda(\cdot, u)$ is strictly convex for each $u \in D[\gamma]$ and $C$ is closed and convex,
then $D[\gamma]$ is an open set relative to $(E^m)^+$.

Proof: Fix $u^* \in D[\gamma]$ and let $x^* = X(u^*)$. [Since $\lambda(\cdot, u^*)$ is strictly convex
and has a minimum over $C$, it must have a unique minimum.] Let $N(x^*; \epsilon)$ be
a neighborhood of $x^*$ of radius $\epsilon$ where $\epsilon > 0$ and $C \cap \partial N(x^*; \epsilon) \ne \emptyset$. If such an $\epsilon$
cannot be found, then $C$ consists of the single point $x^*$ and the theorem is trivial.
Let

    $\mu_1 = \lambda(x^*, u^*)$

    $\mu_2 = \min \{\lambda(x, u^*) : x \in C \cap \partial N(x^*; \epsilon)\}$.

(The symbol $\partial$ denotes the boundary operator.)

Then $\mu_2 > \mu_1$ because $\lambda(\cdot, u^*)$ is strictly convex as a function of $x$. Let

    $\nu_1 = \|f(x^*)\|$

    $\nu_2 = \max \{\|f(x)\| : x \in C \cap \partial N(x^*; \epsilon)\}$

and set

    $\delta = \begin{cases} 1 & \text{if } \nu_1 = 0,\ \nu_2 = 0 \\ (\mu_2 - \mu_1)/3\nu_1 & \text{if } \nu_1 \ne 0,\ \nu_2 = 0 \\ (\mu_2 - \mu_1)/3\nu_2 & \text{if } \nu_1 = 0,\ \nu_2 \ne 0 \\ \min \{(\mu_2 - \mu_1)/3\nu_1,\ (\mu_2 - \mu_1)/3\nu_2\} & \text{if } \nu_1 \ne 0,\ \nu_2 \ne 0. \end{cases}$

Then, if $\|u - u^*\| < \delta$, $u \in D[\gamma]$, it follows that

    $|[\phi(x^*) - \langle u, f(x^*) \rangle] - [\phi(x^*) - \langle u^*, f(x^*) \rangle]| = |\langle u^* - u, f(x^*) \rangle|$
                                                  $\le \|u^* - u\| \, \|f(x^*)\|$
                                                  $= \nu_1 \|u - u^*\|$
                                                  $< (\mu_2 - \mu_1)/3$

so that $\lambda(\cdot, u)$ evaluated at $x^*$ differs from $\mu_1$ by less than $(\mu_2 - \mu_1)/3$.
Furthermore, if $x' \in C \cap \partial N(x^*; \epsilon)$ it follows that

    $|\lambda(x', u^*) - \lambda(x', u)| = |\langle u - u^*, f(x') \rangle|$
                                $\le \|u - u^*\| \, \|f(x')\|$
                                $\le \nu_2 \|u - u^*\|$
                                $< (\mu_2 - \mu_1)/3$

so that

    $\lambda(x', u^*) - (\mu_2 - \mu_1)/3 < \lambda(x', u)$.

Hence

    $\mu_2 - (\mu_2 - \mu_1)/3 < \lambda(x', u)$.

Now $\lambda(\cdot, u)$ must be minimized somewhere in $C \cap N(x^*; \epsilon)$ since this set is
compact. Its minimum cannot occur on $C \cap \partial N(x^*; \epsilon)$, since there is a point $x^*$
in $C \cap N(x^*; \epsilon)$ giving a lower value to $\lambda(\cdot, u)$ than any point on $C \cap \partial N(x^*; \epsilon)$.

Since $\phi$ is strictly convex, the minimum of $\lambda(\cdot, u)$ is the unique global minimum
of $\lambda(\cdot, u)$ over $C$, and the proof is complete.

The strict convexity is essential in this proof since, for example, a linear
program has a closed convex polyhedron for the domain of its auxiliary function.

The next theorem, together with Theorem 1, shows that the auxiliary
problem is a convex program.

Theorem 8

If $\lambda(\cdot, u)$ is strictly convex for each $u \in D[\gamma]$ and $C$ is closed and convex,
then $D[\gamma]$ is convex.

Proof: Fix $u^1$ and $u^2$ in $D[\gamma]$ and let $u^3 = \alpha u^1 + \beta u^2$ where $\alpha, \beta \ge 0$,
$\alpha + \beta = 1$. Since

    $\inf \{\lambda(x, u^3) : x \in C\} \ge \alpha \inf \{\lambda(x, u^1) : x \in C\} + \beta \inf \{\lambda(x, u^2) : x \in C\}$

it follows that $\lambda(\cdot, u^3)$ is bounded below on $C$. It must be shown that the greatest
lower bound of $\lambda(\cdot, u^3)$ over $C$ is actually attained at some point of $C$.

Let

    $\mu_i = \inf \{\lambda(x, u^i) : x \in C\}$  $(i = 1, 2, 3)$

and set

    $\mu_4 > \max \{\mu_1, \mu_2, \mu_3\}$.

Consider the sets

    $L^i(\mu_4) = \{x \in C : \lambda(x, u^i) \le \mu_4\}$  $(i = 1, 2, 3)$.

Since $\lambda(\cdot, u^1)$ and $\lambda(\cdot, u^2)$ have unique minima in $C$, it follows that the sets
$L^1(\mu_4)$ and $L^2(\mu_4)$ are bounded sets because all nonempty level sets of a
convex function are bounded in the same directions.¹³ But

    $L^3(\mu_4) \subset L^1(\mu_4) \cup L^2(\mu_4)$

so that $L^3(\mu_4)$ is also bounded and hence compact. Thus $\lambda(\cdot, u^3)$ attains its
minimum in $C$, and the proof is complete.

Although the strict convexity of $\phi$ is a sufficient condition for the
convexity of $D[\gamma]$, it is not necessary since $D[\gamma]$ is convex in the linear
programming case. Example 3 following Theorem 1 illustrates the need for some
condition such as the strict convexity of $\lambda(\cdot, u)$ to ensure the convexity of $D[\gamma]$.
The corollary that follows relaxes the strict convexity assumption somewhat.

The “relative interior” of $D[\gamma]$ refers to the interior of $D[\gamma]$ with respect
to the smallest linear manifold containing $D[\gamma]$.

Corollary

If $\lambda(\cdot, u)$ is convex and $C$ is closed and convex (so that problem 1 is a
convex program), and if $X(u^1)$ is bounded for some $u^1 \in D[\gamma]$ then the relative
interior of $D[\gamma]$ is convex.

Proof: Let $u^3$ be any point in the relative interior of $D[\gamma]$. Then

    $u^3 = \alpha u^1 + \beta u^2$  $(\alpha, \beta > 0,\ \alpha + \beta = 1)$

where $u^2 \in D[\gamma]$ is on the ray emanating from $u^1$ and passing through $u^3$.

Using the notation of the above theorem we obtain as before

    $\mu_3 \ge \alpha\mu_1 + \beta\mu_2$.

Let $y^* \in X(u^2)$. Then $\lambda(x, u^2) \ge \lambda(y^*, u^2)$ for all $x \in C$; hence

    $L^3(\lambda(y^*, u^3)) = \{x \in C : \alpha\lambda(x, u^1) + \beta\lambda(x, u^2) \le \lambda(y^*, u^3)\}$
                        $= \{x \in C : \alpha\lambda(x, u^1) \le \alpha\lambda(y^*, u^1) + \beta[\lambda(y^*, u^2) - \lambda(x, u^2)]\}$
                        $\subset \{x \in C : \alpha\lambda(x, u^1) \le \alpha\lambda(y^*, u^1)\}$
                        $= L^1(\lambda(y^*, u^1))$, which is a bounded set.

It is not empty since $y^* \in L^3(\lambda(y^*, u^3))$.

Hence $\lambda(\cdot, u^3)$ attains a minimum over $C$. Since $u^3$ was an arbitrary point
in the relative interior of $D[\gamma]$, the proof is complete.

The next theorem categorizes the minimizing function $X$ in the strictly
convex case. Note that $X(u)$ consists of a single point for each $u$ and hence
may be considered a function in the usual sense.

Theorem 9

If $\lambda(\cdot, u)$ is strictly convex for each $u \in D[\gamma]$ and $C$ is closed and convex,
then $X$ is a continuous function on $D[\gamma]$.

Proof: Fix $u^* \in D[\gamma]$ and $\epsilon > 0$. It must be shown that there is a $\delta > 0$
such that $\|u - u^*\| < \delta$, $u \in D[\gamma]$ implies that $\|X(u) - X(u^*)\| < \epsilon$. Set
$x^* = X(u^*)$ and

    $M > \max \{\|f(x) - f(x^*)\| : x \in C \cap \partial N(x^*; \epsilon)\}$.

(If $C \cap \partial N(x^*; \epsilon)$ is empty the proof is immediate.) Let $\delta > 0$ be any number
such that

    $(1/M)[\phi(x) - \phi(x^*) - \langle u^*, f(x) - f(x^*) \rangle] > \delta$

for all $x \in C \cap \partial N(x^*; \epsilon)$.

If $u \in D[\gamma] \cap N(u^*; \delta)$ then

    $M\delta > \|f(x) - f(x^*)\| \, \|u - u^*\|$
            $\ge \langle f(x) - f(x^*), u - u^* \rangle$.

But

    $[\phi(x) - \langle u^*, f(x) \rangle] - [\phi(x^*) - \langle u^*, f(x^*) \rangle] > M\delta$

for all $x \in C \cap \partial N(x^*; \epsilon)$ so that

    $\phi(x) - \langle u, f(x) \rangle > \phi(x^*) - \langle u, f(x^*) \rangle$

for all $x \in C \cap \partial N(x^*; \epsilon)$.

Since $u \in D[\gamma]$, $\lambda(\cdot, u)$ has a minimum over $C$, and this last inequality
shows that this minimum cannot occur on $C \cap \partial N(x^*; \epsilon)$. The strict convexity
of $\lambda(\cdot, u)$ requires that it be minimized in $C \cap N(x^*; \epsilon)$, and the proof is complete.


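Theorem 10†

If $\lambda(\cdot, u)$ is strictly convex for each $u \in D[\gamma]$ and $C$ is closed and convex,
then the partial derivatives $\partial\gamma/\partial u_i$ exist and are continuous throughout int $D[\gamma]$,
and hence $\gamma$ is differentiable there. Moreover, if $u^* \in D[\gamma]$ and $u_i^* = 0$ then the
right-hand partial $\partial\gamma/\partial u_i^+$ exists at $u^*$.

Proof: Fix $u^* \in D[\gamma]$ and $h > 0$. By letting $e^i$ denote the $i$th unit vector,

    $\dfrac{\gamma(u^* + he^i) - \gamma(u^*)}{h} \le \dfrac{1}{h}[\lambda(X(u^*), u^* + he^i) - \lambda(X(u^*), u^*)]$
                                         $= -f_i(X(u^*))$.

On the other hand

    $\dfrac{\gamma(u^* + he^i) - \gamma(u^*)}{h} \ge \dfrac{1}{h}[\lambda(X(u^* + he^i), u^* + he^i) - \lambda(X(u^* + he^i), u^*)]$
                                         $= -f_i(X(u^* + he^i))$.

Since $f_i$ and $X$ are continuous, it follows that

    $\lim_{h \to 0^+} \{-f_i(X(u^* + he^i))\} = -f_i(X(u^*)) = \dfrac{\partial\gamma}{\partial u_i^+}\Big|_{u^*}$.

For $u_i^* > 0$ a similar proof shows that $\partial\gamma/\partial u_i^- |_{u^*} = -f_i(X(u^*))$ and the proof is
complete.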

This  theorem,  together  with  Theorems  7  and  8,  allows  one  to  find  the 
solution  of  the  auxiliary  problem  by  employing  any  standard  gradient-ascent 
technique  that  takes  into  account  the  nonnegativity  condition  on  u  (this  last 
restriction  is  unnecessary  if  only  equality  constraints  are  present  on  the 
original  problem). 
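A gradient-ascent scheme of the kind described can be sketched in a few lines. The example assumes the Example 1 data ($\phi(x) = \tfrac{1}{2}x_1^2 + 2x_1 + \tfrac{1}{2}x_2^2 - x_2$, $f(x) = x_1 - x_2 - 2$, $C = \{x : x \ge 0\}$), uses the formula $\nabla\gamma(u) = -f(X(u))$, and handles the nonnegativity condition by projection onto $u \ge 0$. The fixed step size, the iteration count, and the names are hypothetical choices, not a recommended algorithm.

```python
def phi(x1, x2):
    """Objective assumed for Example 1."""
    return 0.5 * x1 ** 2 + 2.0 * x1 + 0.5 * x2 ** 2 - x2


def f(x1, x2):
    """Constraint function assumed for Example 1."""
    return x1 - x2 - 2.0


def minimizing_point(u):
    """X(u): the inner minimization, available here in closed form."""
    return max(0.0, u - 2.0), max(0.0, 1.0 - u)


def projected_gradient_ascent(u0=0.0, step=0.5, iterations=200):
    """Maximize gamma over u >= 0 using the gradient -f(X(u))."""
    u = u0
    for _ in range(iterations):
        gradient = -f(*minimizing_point(u))
        u = max(0.0, u + step * gradient)  # project back onto u >= 0
    return u, minimizing_point(u)
```

Starting from $u = 0$ the iterates approach $u^* = 4$, and $X(u^*) = (2, 0)$ is feasible with $\phi(X(u^*)) = \gamma(u^*) = 6$. Each iteration costs one inner minimization, which is cheap here only because the Lagrangian is separable.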

Although $\gamma$ is continuously differentiable in int $D[\gamma]$ it is not, in general,
twice continuously differentiable. For example, when the objective function
and the constraining functions are separable and when $C = (E^n)^+$, the region
$D[\gamma]$ is partitioned into several subregions by hyperplanes (see Ref 11, p 92).
The degree of differentiability of $\gamma$ inside these subregions depends primarily
on the degree of differentiability of $\phi$ and the $f_i$. Typically $\gamma$ is not twice
continuously differentiable on the common boundaries of these subregions.


Example 7 (See Fig. 4)


Minimize


subject to

    $x_1 - x_3 \ge 1$

    $x_1 + x_2 \ge 2$

    $x \ge 0$.


†The author is indebted to G. P. McCormick for suggesting the proof presented
here for this theorem.

Fig. 4. The Domain of $\gamma$


The auxiliary function of this problem is defined by

    $\gamma(u) = \min_{x_1, x_2, x_3 \ge 0} \{\phi(x) - u_1(x_1 - x_3 - 1) - u_2(x_1 + x_2 - 2)\}$,

where $\phi$ denotes the objective function of the example.

Because of the separability of $\lambda(x, u)$ in the $x$ variables, it may be written that

y(u)  -  min  j-xj  ♦  2X|.(U|  *  uJ  x ji 
*  , 0  *  -  ’ 

-  \  ij 

4  min  lo  “  •  .‘U9  U.>x.J 
X2  0  * 

4  iX\  *  Mixsj 

*•  M,  •  !<2 


Each  of  these  minimizations  is  easily  performed  and  it  is  found  that 


V 


•!"7 


-2 


aadefiaed 

^  -fw(3  -  tt j) 


*3f“>  ~ 


I  4  ~  M, 


whe»  Hj  ^  «2  >  2 
otherwise 
when  u2  >  3 
2i«2<  3 

«j^2 

whe«  ll|  S  4 
otherwise. 


The domain of $\gamma$ is given by the inequalities involved in the definitions of the
$X_i(u)$'s above. Figure 4 illustrates $D[\gamma]$ along with the subregions defined by
these inequalities. Note that

    $\nabla\gamma(3, 1) = b - AX(3, 1) = 0$

so that the point $u^* = (3, 1)$ maximizes $\gamma$. This point could have been found
using a standard gradient-ascent technique that takes into consideration the
possible bounds on $D[\gamma]$. In many cases $D[\gamma] = E^m$ so that no special care
is necessary.

The following two theorems state that problem 1 and its auxiliary
problem are equivalent in the sense that the solution of one provides a solution of
the other and that their optimal values are equal. The proof of Theorem 11
follows directly from Theorems 7 and 10.

Theorem 11

If $\gamma$ is maximized over $D[\gamma]$ at $u^*$ then $x^* = X(u^*)$ is the solution of
problem 1. Furthermore, $\gamma(u^*) = \phi(x^*)$.

Proof: Since $\gamma$ attains its maximum at $u^*$ it is necessary that

    $\dfrac{\partial\gamma}{\partial u_i}\Big|_{u^*} = 0$  if $u_i^* > 0$
    $\dfrac{\partial\gamma}{\partial u_i^+}\Big|_{u^*} \le 0$  if $u_i^* = 0$

i.e.,

    $f_i(x^*) = 0$  if $u_i^* > 0$
    $f_i(x^*) \ge 0$  if $u_i^* = 0$

so that $x^*$ is feasible. Moreover

    $\gamma(u^*) = \phi(x^*) - \langle u^*, f(x^*) \rangle = \phi(x^*)$

so that $\phi$ attains its minimum at $x^*$ by Theorem 4, and the proof is complete.

To prove the converse of Theorem 11 we may modify the proof of a
similar theorem found in Arrow et al.¹⁴ It is necessary to make the
additional assumption that there is a point $x^0 \in C$ such that $f(x^0) > 0$ (which
implies that $\{x \mid f(x) \ge 0\}$ has a nonempty interior relative to $C$). This is a
common assumption that is often employed when dealing with concave inequalities
(see Arrow et al., p 34).¹⁴ The assumption of strict convexity may be dropped
for this proof.

Theorem  12 

Let $\phi, -f_1, \ldots, -f_m$ be convex functions defined over $C$ and assume that
there is a point $x^0 \in C$ such that $f(x^0) > 0$. If problem 1 has a solution $x^*$, then
its auxiliary problem is feasible and has a solution $u^*$. Moreover $\phi(x^*) = \gamma(u^*)$.
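Proof: Define two sets $T$ and $T'$ in $E^{m+1}$ by

$$T = \{(\tau, t) : \tau \ge \phi(x),\ t \le f(x) \text{ for some } x \in C\}$$
$$T' = \{(\tau', t') : \phi(x^*) > \tau',\ t' \ge 0\}.$$

It is easily seen that $T \cap T' = \emptyset$. Since $T$ and $T'$ are convex, there is a
hyperplane that separates them; i.e., there is an $(m+1)$-vector $(\nu^*, v^*) \ne 0$ such that

$$\nu^*\tau + \langle v^*, t\rangle \ge \nu^*\tau' + \langle v^*, t'\rangle \quad \text{for all } (\tau, t) \in T \text{ and } (\tau', t') \in T'.$$

It will now be shown that

$$\nu^* \ge 0$$

$$v^* \le 0.$$

Fix $t$, $\tau'$ and $t'$ in the above inequality and let $\tau \to \infty$. If $\nu^* < 0$ the inequality
would become violated for sufficiently large $\tau$. A similar argument fixing
$\tau$, $t$, and $\tau'$ and allowing $t'$ to become arbitrarily large yields $v^* \le 0$. Moreover,
$\nu^* \ne 0$ because if $\nu^* = 0$ then

$$\langle v^*, f(x^0)\rangle \ge 0$$

since $(\phi(x^0), f(x^0)) \in T$ and $(\phi(x^*) - 1, 0) \in T'$. But $v^* \le 0$, $v^* \ne 0$ (because $(\nu^*, v^*) \ne 0$),
and $f(x^0) > 0$ so that

$$\langle v^*, f(x^0)\rangle < 0,$$

which is a contradiction.

Set

$$u^* = -v^*/\nu^*.$$

Thus $u^* \ge 0$ and, dividing the separation inequality by $\nu^* > 0$,

$$\tau - \langle u^*, t\rangle \ge \tau' - \langle u^*, t'\rangle.$$

Since $(\phi(x), f(x)) \in T$ for any $x \in C$, it follows that

$$\phi(x) - \langle u^*, f(x)\rangle \ge \tau' - \langle u^*, t'\rangle.$$

Setting $t' = 0$ and taking the supremum of the right-hand side gives

$$\phi(x) - \langle u^*, f(x)\rangle \ge \phi(x^*),$$

which implies immediately the conclusion of the theorem.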
4. SOME APPLICATIONS


In this section some applications of the theory developed in Secs 2 and 3
are briefly described. A paper describing applications in more detail is in
preparation. Lasdon9 and Takahashi9 contain additional applications to
resource and multistage allocation problems.

Duality 

The  importance  of  duality  theory  in  linear  programming  has  led  to  the 
concept  of  the  dual  of  a  nonlinear  program.  The  formulation  in  this  section 
is  from  P.  Wolfe.9 

The problem that Wolfe considers has the form

minimize

$$\phi(x)$$

subject to

$$f(x) \ge 0 \qquad (9)$$

where $\phi$ and the $m$ components of $-f$ are convex and continuously differentiable
functions.

The Wolfe dual of Eq 9 has the form

maximize

$$\psi(x,u) = \phi(x) - \langle u, f(x)\rangle$$

subject to

$$\nabla_x\psi(x,u) = 0 \qquad (10)$$

$$u \ge 0.$$

The feasible set of Eq 10 is denoted by $D[\psi]$. Note that Eq 10 contains the
variable $x$ as well as the dual variable $u$. In general the constraints of Eq 10
do not describe a convex set.

Theorem  13 

The auxiliary problem of Eq 9 is equivalent to Eq 10 in the following sense:

(a) $D[\psi] = \{(x,u) : u \in D[\gamma],\ x \in X(u)\}$

(b) $\psi(x,u) = \gamma(u)$ for each $(x,u) \in D[\psi]$.

Proof: Since $\phi$ and $-f_i$ $(i = 1, \ldots, m)$ are convex, a necessary and
sufficient condition that $x$ minimize $\psi(\cdot, u)$ over $E^n$ for $u \ge 0$ is that

$$\nabla_x\psi(x,u) = 0.$$

Hence, if $(x,u) \in D[\psi]$, then $x$ minimizes $\psi(\cdot, u)$ over $E^n$ and conversely. This
proves statement a. Statement b is immediate from the definition of $\gamma$.

While Eq 10 is not a convex program, the auxiliary problem of Eq 9 is,
at least when $\phi$ is strictly convex. Also the primal variable $x$ does not occur
in the auxiliary problem since it has been replaced by $X(u)$.
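
A small worked illustration may help here. It uses a hypothetical one-variable problem chosen for this purpose (it is not taken from Wolfe or from the text): $n = m = 1$, $\phi(x) = x^2$ and $f(x) = x - 1$, so that Eq 9 asks for the minimum of $x^2$ subject to $x \ge 1$.

```latex
\begin{aligned}
\psi(x,u) &= x^2 - u(x - 1), \qquad
\nabla_x \psi(x,u) = 2x - u = 0 \;\Rightarrow\; X(u) = u/2, \\
\gamma(u) &= \psi(X(u),u) = \tfrac{u^2}{4} - u\left(\tfrac{u}{2} - 1\right) = u - \tfrac{u^2}{4}, \\
\gamma'(u) &= 1 - \tfrac{u}{2} = 0 \;\Rightarrow\; u^* = 2, \quad
x^* = X(u^*) = 1, \quad \gamma(u^*) = 1 = \phi(x^*).
\end{aligned}
```

In this instance $D[\psi] = \{(u/2, u) : u \ge 0\}$, and along it $\psi$ coincides with $\gamma$, in line with statements a and b of Theorem 13.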
Decomposable and Separable Programming

Suppose Eq 1 has the form

minimize

$$\phi(x) = \sum_{i=1}^{p} \phi_i(x_i)$$

subject to

$$f(x) = \sum_{i=1}^{p} f^i(x_i) \ge 0$$

$$g_i(x_i) \ge 0 \qquad (i = 1, \ldots, p)$$

where $\phi$ is strictly convex, each $f^i$ and $g_i$ are concave, $x = (x_1, \ldots, x_p)^T$, and
$x_i$ is a vector having $n_i$ components. Such a problem is said to be decomposable
and if $n_i = 1$ $(i = 1, \ldots, p)$ it is said to be separable (completely decomposable).
Letting $C = \{x : g_i(x_i) \ge 0,\ i = 1, \ldots, p\}$, by Theorem 5, yields

$$\gamma(u) = \sum_{i=1}^{p} \min\{\phi_i(x_i) - \langle u, f^i(x_i)\rangle : g_i(x_i) \ge 0\}.$$


Hence, if a gradient-ascent procedure is used to maximize $\gamma$, the essential
quantities $\gamma(u)$ and $\nabla\gamma(u)$ can be obtained by solving $p$ nonlinear programs for
each $u$. The $i$th program involves $n_i$ variables. Thus the solution of a
decomposable program is obtained by solving $p$ smaller subprograms for a sequence
of $u$'s tending to $u^*$. In the separable case the $p$ subprograms involve a single
variable only and, in many cases, $X_i(u)$ can be expressed analytically. In the
important special case where $f$ has a single component much more can be said
about the solution of Eq 1 (see Falk11).
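
A minimal sketch of the separable case follows, under assumptions that are not in the text: $\phi_i(x_i) = (x_i - c_i)^2$, a single linear constraint $f(x) = \sum_i a_i x_i - b \ge 0$, and $g_i$ expressing the bounds $lo \le x_i \le hi$. With these choices each subprogram has the analytic solution $X_i(u) = \mathrm{clip}(c_i + u a_i/2,\ lo,\ hi)$, and $\gamma'(u) = -f(X(u))$. All names, data, and the fixed step size are hypothetical.

```python
def clip(v, lo, hi):
    """Restrict v to the interval [lo, hi]."""
    return min(hi, max(lo, v))

def subproblem(u, a_i, c_i, lo, hi):
    """X_i(u): minimizer of (x - c_i)**2 - u*a_i*x over lo <= x <= hi."""
    return clip(c_i + 0.5 * u * a_i, lo, hi)

def solve_separable(a, c, b, lo, hi, step=0.2, iters=500):
    """Gradient ascent on gamma(u), u >= 0; each step solves p one-variable problems."""
    u = 0.0
    for _ in range(iters):
        x = [subproblem(u, ai, ci, lo, hi) for ai, ci in zip(a, c)]
        f_val = sum(ai * xi for ai, xi in zip(a, x)) - b
        u = max(0.0, u - step * f_val)  # gamma'(u) = -f(X(u)), projected onto u >= 0
    x = [subproblem(u, ai, ci, lo, hi) for ai, ci in zip(a, c)]
    return u, x

# Hypothetical data: minimize x1^2 + x2^2 + x3^2
# subject to x1 + 2*x2 + x3 >= 6 and 0 <= x_i <= 1.6.
u_sep, x_sep = solve_separable(a=[1.0, 2.0, 1.0], c=[0.0, 0.0, 0.0],
                               b=6.0, lo=0.0, hi=1.6)
print(u_sep, x_sep)
```

For this data the second variable sits at its upper bound, and the iterates approach $u^* = 2.8$ with $x^* = (1.4, 1.6, 1.4)$, at which the linear constraint holds with equality.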


Minimizing  Quotients 


Suppose one is seeking the solution of a problem having the form

minimize

$$\phi(x)/\psi(x)$$

subject to

$$x \in S \qquad (11)$$

where $\phi$ and $-\psi$ are convex, $S$ is closed and convex, and $\psi(x) > 0$ for all $x \in S$.
It will be assumed that $\phi$ is strictly convex.

The function $\delta$ may be defined over its domain $D[\delta]$ by the relations

$$D[\delta] = \{\mu \ge 0 : \phi(x) - \mu\psi(x) \text{ has a minimum over } S\}$$
$$\delta(\mu) = \min\{\phi(x) - \mu\psi(x) : x \in S\}.$$

By the theorems of Sec 3 it is known that $D[\delta]$ is open with respect to $(E^1)^+$
and convex, and $\delta$ is concave.


Theorem  14 

$\delta$ is a monotone decreasing function.
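Proof: Let $\mu' \le \mu''$ where $\mu', \mu'' \in D[\delta]$. Since $\psi(x) > 0$ for $x \in S$ we have

$$\phi(x) - \mu''\psi(x) \le \phi(x) - \mu'\psi(x) \quad \text{for } x \in S$$

so that

$$\delta(\mu'') \le \phi(X(\mu')) - \mu'\psi(X(\mu'))$$

where $X(\mu')$ minimizes $\phi(\cdot) - \mu'\psi(\cdot)$ over $S$. Hence

$$\delta(\mu'') \le \delta(\mu')$$

and the proof is complete.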

Since $\delta$ is continuous and monotone decreasing, the set of points $\mu^*$ for
which $\delta(\mu^*) = 0$ is connected and compact. The next theorem characterizes
the points of this set.

Theorem  15 

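There is a point $\mu^*$ such that $\delta(\mu^*) = 0$ if and only if $\mu^*$ is the optimum
value of the objective function of Eq 11. Moreover $X(\mu^*)$ is a solution of Eq 11.

Proof: If $\delta(\mu^*) = 0$ it follows that

$$\phi(x) - \mu^*\psi(x) \ge \delta(\mu^*) = 0 \quad \text{for } x \in S.$$

Hence

$$\phi(x)/\psi(x) \ge \mu^* \quad \text{for all } x \in S.$$

Moreover

$$\phi(X(\mu^*)) - \mu^*\psi(X(\mu^*)) = 0$$

so that

$$\phi(X(\mu^*))/\psi(X(\mu^*)) = \mu^*.$$

The other half of the proof is similar.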

Hence the problem of minimizing the quotient of two functions can be
viewed as a sequence of minimization problems not involving quotients. In
many cases, each problem in this sequence requires little computational
effort compared to the original problem. The problems are generated one
after another in a manner that will locate a zero of the concave decreasing
function $\delta$.
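
One way such a sequence might be generated is by bisection on $\mu$, which is a choice made for this sketch and not a procedure prescribed by the text. The example is hypothetical: $\phi(x) = x^2 + 1$, $\psi(x) = x$ and $S = [0.5, 3]$, so that $X(\mu) = \mathrm{clip}(\mu/2,\ 0.5,\ 3)$ is available in closed form. The bracketing interval $[0, 6]$ is assumed to contain the zero of $\delta$.

```python
def minimizer(mu, lo=0.5, hi=3.0):
    """X(mu): minimizer of x**2 + 1 - mu*x over S = [lo, hi]."""
    return min(hi, max(lo, mu / 2.0))

def delta(mu):
    """delta(mu) = min over S of phi(x) - mu*psi(x) for the hypothetical phi, psi."""
    x = minimizer(mu)
    return x * x + 1.0 - mu * x

def find_zero(mu_lo=0.0, mu_hi=6.0, tol=1e-10):
    """Bisection for the zero of the decreasing function delta on [mu_lo, mu_hi]."""
    while mu_hi - mu_lo > tol:
        mid = 0.5 * (mu_lo + mu_hi)
        if delta(mid) > 0.0:
            mu_lo = mid   # zero lies to the right of mid
        else:
            mu_hi = mid   # zero lies at or to the left of mid
    return 0.5 * (mu_lo + mu_hi)

mu_star = find_zero()
x_quot = minimizer(mu_star)
print(mu_star, x_quot, (x_quot ** 2 + 1.0) / x_quot)
```

Here $\delta(\mu) = 1 - \mu^2/4$ for $1 \le \mu \le 6$, so the zero is $\mu^* = 2$ with $X(\mu^*) = 1$, which agrees with the minimum value 2 of $(x^2 + 1)/x$ on $S$.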